{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Performance Comparison &mdash; pandas Versus RAPIDS cuDF\n",
    "\n",
    "This tutorial uses `timeit` to compare performance benchmarks with pandas and RAPIDS cuDF."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"rapids-conda-envir-setup\"></a>\n",
    "## Setting Up a RAPIDS conda Environment with cuDF and cuML"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use the cuDF and cuML RAPIDS libraries, you need to create a RAPIDS conda environment and run this notebook with the python kernel.\n",
    "For example, use the following command to create a RAPIDS conda environment named `rapids` with rapids version 0.14 and python 3.7:\n",
    "\n",
    "```sh\n",
    "conda create -n rapids -c rapidsai -c nvidia -c anaconda -c conda-forge -c defaults ipykernel rapids=0.14 python=3.7 cudatoolkit=10.1\n",
    "```\n",
    "\n",
    "After that, make sure to open this notebook with the kernel named `conda-rapids`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## System Details"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "==============NVSMI LOG==============\n",
      "\n",
      "Timestamp                           : Thu Jul  2 15:45:49 2020\n",
      "Driver Version                      : 440.31\n",
      "CUDA Version                        : 10.2\n",
      "\n",
      "Attached GPUs                       : 1\n",
      "GPU 00000000:00:1E.0\n",
      "    Product Name                    : Tesla V100-SXM2-16GB\n",
      "    Product Brand                   : Tesla\n",
      "    Display Mode                    : Enabled\n",
      "    Display Active                  : Disabled\n",
      "    Persistence Mode                : Enabled\n",
      "    Accounting Mode                 : Disabled\n",
      "    Accounting Mode Buffer Size     : 4000\n",
      "    Driver Model\n",
      "        Current                     : N/A\n",
      "        Pending                     : N/A\n",
      "    Serial Number                   : 0323617005627\n",
      "    GPU UUID                        : GPU-43bd4553-f5b7-55ab-0633-ecba7c3a64d5\n",
      "    Minor Number                    : 0\n",
      "    VBIOS Version                   : 88.00.4F.00.09\n",
      "    MultiGPU Board                  : No\n",
      "    Board ID                        : 0x1e\n",
      "    GPU Part Number                 : 900-2G503-0000-000\n",
      "    Inforom Version\n",
      "        Image Version               : G503.0201.00.03\n",
      "        OEM Object                  : 1.1\n",
      "        ECC Object                  : 5.0\n",
      "        Power Management Object     : N/A\n",
      "    GPU Operation Mode\n",
      "        Current                     : N/A\n",
      "        Pending                     : N/A\n",
      "    GPU Virtualization Mode\n",
      "        Virtualization Mode         : Pass-Through\n",
      "        Host VGPU Mode              : N/A\n",
      "    IBMNPU\n",
      "        Relaxed Ordering Mode       : N/A\n",
      "    PCI\n",
      "        Bus                         : 0x00\n",
      "        Device                      : 0x1E\n",
      "        Domain                      : 0x0000\n",
      "        Device Id                   : 0x1DB110DE\n",
      "        Bus Id                      : 00000000:00:1E.0\n",
      "        Sub System Id               : 0x121210DE\n",
      "        GPU Link Info\n",
      "            PCIe Generation\n",
      "                Max                 : 3\n",
      "                Current             : 3\n",
      "            Link Width\n",
      "                Max                 : 16x\n",
      "                Current             : 16x\n",
      "        Bridge Chip\n",
      "            Type                    : N/A\n",
      "            Firmware                : N/A\n",
      "        Replays Since Reset         : 0\n",
      "        Replay Number Rollovers     : 0\n",
      "        Tx Throughput               : 0 KB/s\n",
      "        Rx Throughput               : 2000 KB/s\n",
      "    Fan Speed                       : N/A\n",
      "    Performance State               : P0\n",
      "    Clocks Throttle Reasons\n",
      "        Idle                        : Active\n",
      "        Applications Clocks Setting : Not Active\n",
      "        SW Power Cap                : Not Active\n",
      "        HW Slowdown                 : Not Active\n",
      "            HW Thermal Slowdown     : Not Active\n",
      "            HW Power Brake Slowdown : Not Active\n",
      "        Sync Boost                  : Not Active\n",
      "        SW Thermal Slowdown         : Not Active\n",
      "        Display Clock Setting       : Not Active\n",
      "    FB Memory Usage\n",
      "        Total                       : 16160 MiB\n",
      "        Used                        : 0 MiB\n",
      "        Free                        : 16160 MiB\n",
      "    BAR1 Memory Usage\n",
      "        Total                       : 16384 MiB\n",
      "        Used                        : 2 MiB\n",
      "        Free                        : 16382 MiB\n",
      "    Compute Mode                    : Default\n",
      "    Utilization\n",
      "        Gpu                         : 0 %\n",
      "        Memory                      : 0 %\n",
      "        Encoder                     : 0 %\n",
      "        Decoder                     : 0 %\n",
      "    Encoder Stats\n",
      "        Active Sessions             : 0\n",
      "        Average FPS                 : 0\n",
      "        Average Latency             : 0\n",
      "    FBC Stats\n",
      "        Active Sessions             : 0\n",
      "        Average FPS                 : 0\n",
      "        Average Latency             : 0\n",
      "    Ecc Mode\n",
      "        Current                     : Enabled\n",
      "        Pending                     : Enabled\n",
      "    ECC Errors\n",
      "        Volatile\n",
      "            Single Bit            \n",
      "                Device Memory       : 0\n",
      "                Register File       : 0\n",
      "                L1 Cache            : 0\n",
      "                L2 Cache            : 0\n",
      "                Texture Memory      : N/A\n",
      "                Texture Shared      : N/A\n",
      "                CBU                 : N/A\n",
      "                Total               : 0\n",
      "            Double Bit            \n",
      "                Device Memory       : 0\n",
      "                Register File       : 0\n",
      "                L1 Cache            : 0\n",
      "                L2 Cache            : 0\n",
      "                Texture Memory      : N/A\n",
      "                Texture Shared      : N/A\n",
      "                CBU                 : 0\n",
      "                Total               : 0\n",
      "        Aggregate\n",
      "            Single Bit            \n",
      "                Device Memory       : 4\n",
      "                Register File       : 0\n",
      "                L1 Cache            : 0\n",
      "                L2 Cache            : 0\n",
      "                Texture Memory      : N/A\n",
      "                Texture Shared      : N/A\n",
      "                CBU                 : N/A\n",
      "                Total               : 4\n",
      "            Double Bit            \n",
      "                Device Memory       : 0\n",
      "                Register File       : 0\n",
      "                L1 Cache            : 0\n",
      "                L2 Cache            : 0\n",
      "                Texture Memory      : N/A\n",
      "                Texture Shared      : N/A\n",
      "                CBU                 : 0\n",
      "                Total               : 0\n",
      "    Retired Pages\n",
      "        Single Bit ECC              : 0\n",
      "        Double Bit ECC              : 0\n",
      "        Pending Page Blacklist      : No\n",
      "    Temperature\n",
      "        GPU Current Temp            : 40 C\n",
      "        GPU Shutdown Temp           : 90 C\n",
      "        GPU Slowdown Temp           : 87 C\n",
      "        GPU Max Operating Temp      : 83 C\n",
      "        Memory Current Temp         : 37 C\n",
      "        Memory Max Operating Temp   : 85 C\n",
      "    Power Readings\n",
      "        Power Management            : Supported\n",
      "        Power Draw                  : 25.67 W\n",
      "        Power Limit                 : 300.00 W\n",
      "        Default Power Limit         : 300.00 W\n",
      "        Enforced Power Limit        : 300.00 W\n",
      "        Min Power Limit             : 150.00 W\n",
      "        Max Power Limit             : 300.00 W\n",
      "    Clocks\n",
      "        Graphics                    : 135 MHz\n",
      "        SM                          : 135 MHz\n",
      "        Memory                      : 877 MHz\n",
      "        Video                       : 555 MHz\n",
      "    Applications Clocks\n",
      "        Graphics                    : 1312 MHz\n",
      "        Memory                      : 877 MHz\n",
      "    Default Applications Clocks\n",
      "        Graphics                    : 1312 MHz\n",
      "        Memory                      : 877 MHz\n",
      "    Max Clocks\n",
      "        Graphics                    : 1530 MHz\n",
      "        SM                          : 1530 MHz\n",
      "        Memory                      : 877 MHz\n",
      "        Video                       : 1372 MHz\n",
      "    Max Customer Boost Clocks\n",
      "        Graphics                    : 1530 MHz\n",
      "    Clock Policy\n",
      "        Auto Boost                  : N/A\n",
      "        Auto Boost Default          : N/A\n",
      "    Processes                       : None\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!nvidia-smi -q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Benchmark Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Installations\n",
    "\n",
    "Install v3io-generator to create a 1 GB data set for the benchmark.<br>\n",
    "You only need to run the generator once, and then you can reuse the generated data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looking in indexes: https://test.pypi.org/simple/\n",
      "Collecting v3io-generator\n",
      "  Downloading https://test-files.pythonhosted.org/packages/6c/f6/ba9045111de98747af2c94e10f3dbf74311e6bd3a033c7ea1ca84e084e82/v3io_generator-0.0.27.dev0-py3-none-any.whl (9.3 kB)\n",
      "Installing collected packages: v3io-generator\n",
      "Successfully installed v3io-generator-0.0.27.dev0\n",
      "Collecting faker\n",
      "  Using cached Faker-4.1.1-py3-none-any.whl (1.0 MB)\n",
      "Requirement already satisfied: python-dateutil>=2.4 in /User/.conda/envs/rapids/lib/python3.7/site-packages (from faker) (2.8.1)\n",
      "Collecting text-unidecode==1.3\n",
      "  Using cached text_unidecode-1.3-py2.py3-none-any.whl (78 kB)\n",
      "Requirement already satisfied: six>=1.5 in /User/.conda/envs/rapids/lib/python3.7/site-packages (from python-dateutil>=2.4->faker) (1.15.0)\n",
      "Installing collected packages: text-unidecode, faker\n",
      "Successfully installed faker-4.1.1 text-unidecode-1.3\n",
      "Collecting pytimeparse\n",
      "  Using cached pytimeparse-1.1.8-py2.py3-none-any.whl (10.0 kB)\n",
      "Installing collected packages: pytimeparse\n",
      "Successfully installed pytimeparse-1.1.8\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install -i https://test.pypi.org/simple/ v3io-generator\n",
    "!{sys.executable} -m pip install faker\n",
    "!{sys.executable} -m pip install pytimeparse"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Note:** You must **restart the Jupyter kernel** to complete the installation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import yaml\n",
    "import time\n",
    "import datetime\n",
    "import json\n",
    "import itertools\n",
    "\n",
    "# Generator\n",
    "from v3io_generator import metrics_generator, deployment_generator\n",
    "\n",
    "# Dataframes\n",
    "import cudf\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configurations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Benchmark configurations\n",
    "metric_names = ['cpu_utilization', 'latency', 'packet_loss', 'throughput']\n",
    "nlargest = 10\n",
    "source_file = os.path.join(os.getcwd(), 'data', 'ops.logs') # Use full path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the Data Source\n",
    "\n",
    "Use v3io-generator to create a time-series network-operations dataset for 100 companies, including 4 metrics (CPU utilization, latency, throughput, and packet loss).<br>\n",
    "Then, write the dataset to a JSON file to be used as the data source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>company</th>\n",
       "      <th>cpu_utilization</th>\n",
       "      <th>latency</th>\n",
       "      <th>packet_loss</th>\n",
       "      <th>throughput</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Williams_and_Sons</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Guerrero_Ltd</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Harris-Gutierrez</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shaw-Williams</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Harris_Inc</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             company  cpu_utilization  latency  packet_loss  throughput\n",
       "0  Williams_and_Sons                0        0            0           0\n",
       "1       Guerrero_Ltd                0        0            0           0\n",
       "2   Harris-Gutierrez                0        0            0           0\n",
       "3      Shaw-Williams                0        0            0           0\n",
       "4         Harris_Inc                0        0            0           0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a metadata factory\n",
    "dep_gen = deployment_generator.deployment_generator()\n",
    "faker=dep_gen.get_faker()\n",
    "\n",
    "# Design the metadata\n",
    "dep_gen.add_level(name='company',number=100,level_type=faker.company)\n",
    "\n",
    "# Generate a deployment structure\n",
    "deployment_df = dep_gen.generate_deployment()\n",
    "\n",
    "# Initialize the metric values\n",
    "for metric in metric_names:\n",
    "    deployment_df[metric] = 0\n",
    "\n",
    "deployment_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify metrics configuration for the generator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "metrics_configuration = yaml.safe_load(\"\"\"\n",
    "errors: {length_in_ticks: 50, rate_in_ticks: 150}\n",
    "timestamps: {interval: 5s, stochastic_interval: false}\n",
    "metrics:\n",
    "  cpu_utilization:\n",
    "    accuracy: 2\n",
    "    distribution: normal\n",
    "    distribution_params: {mu: 70, noise: 0, sigma: 10}\n",
    "    is_threshold_below: true\n",
    "    past_based_value: false\n",
    "    produce_max: false\n",
    "    produce_min: false\n",
    "    validation:\n",
    "      distribution: {max: 1, min: -1, validate: false}\n",
    "      metric: {max: 100, min: 0, validate: true}\n",
    "  latency:\n",
    "    accuracy: 2\n",
    "    distribution: normal\n",
    "    distribution_params: {mu: 0, noise: 0, sigma: 5}\n",
    "    is_threshold_below: true\n",
    "    past_based_value: false\n",
    "    produce_max: false\n",
    "    produce_min: false\n",
    "    validation:\n",
    "      distribution: {max: 1, min: -1, validate: false}\n",
    "      metric: {max: 100, min: 0, validate: true}\n",
    "  packet_loss:\n",
    "    accuracy: 0\n",
    "    distribution: normal\n",
    "    distribution_params: {mu: 0, noise: 0, sigma: 2}\n",
    "    is_threshold_below: true\n",
    "    past_based_value: false\n",
    "    produce_max: false\n",
    "    produce_min: false\n",
    "    validation:\n",
    "      distribution: {max: 1, min: -1, validate: false}\n",
    "      metric: {max: 50, min: 0, validate: true}\n",
    "  throughput:\n",
    "    accuracy: 2\n",
    "    distribution: normal\n",
    "    distribution_params: {mu: 250, noise: 0, sigma: 20}\n",
    "    is_threshold_below: false\n",
    "    past_based_value: false\n",
    "    produce_max: false\n",
    "    produce_min: false\n",
    "    validation:\n",
    "      distribution: {max: 1, min: -1, validate: false}\n",
    "      metric: {max: 300, min: 0, validate: true}\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create the data according to the given hierarchy and metrics configuration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Saving generated data to: /User/data-ingestion-and-preparation/data/ops.logs\n"
     ]
    }
   ],
   "source": [
    "met_gen = metrics_generator.Generator_df(metrics_configuration, \n",
    "                                         user_hierarchy=deployment_df, \n",
    "                                         initial_timestamp=time.time())\n",
    "\n",
    "metrics = met_gen.generate_range(start_time=datetime.datetime.now(),\n",
    "                                 end_time=datetime.datetime.now()+datetime.timedelta(hours=62),\n",
    "                                 as_df=True,\n",
    "                                 as_iterator=False)\n",
    "\n",
    "# Verify that the source-file parent directory exists.\n",
    "os.makedirs(os.path.dirname(source_file), exist_ok=1)\n",
    "\n",
    "print(f'Saving generated data to: {source_file}')\n",
    "\n",
    "# Generate file from metrics\n",
    "with open(source_file, 'w') as f:\n",
    "    metrics_batch = metrics\n",
    "    metrics_batch.to_json(f,\n",
    "                          orient='records',\n",
    "                          lines=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Validate the Target File Size\n",
    "\n",
    "Get the target size for the test file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1207964564"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "Path(source_file).stat().st_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['{\"company\":\"Williams_and_Sons\",\"cpu_utilization\":64.6440138248,\"cpu_utilization_is_error\":false,\"latency\":2.9965630871,\"latency_is_error\":false,\"packet_loss\":0.0,\"packet_loss_is_error\":false,\"throughput\":258.7732213917,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Guerrero_Ltd\",\"cpu_utilization\":68.5296690547,\"cpu_utilization_is_error\":false,\"latency\":0.0,\"latency_is_error\":false,\"packet_loss\":0.0,\"packet_loss_is_error\":false,\"throughput\":288.8039306559,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Harris-Gutierrez\",\"cpu_utilization\":55.8557277251,\"cpu_utilization_is_error\":false,\"latency\":1.7068227314,\"latency_is_error\":false,\"packet_loss\":1.6544231936,\"packet_loss_is_error\":false,\"throughput\":265.4031916784,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Shaw-Williams\",\"cpu_utilization\":72.8668610421,\"cpu_utilization_is_error\":false,\"latency\":1.6477141418,\"latency_is_error\":false,\"packet_loss\":0.8709185994,\"packet_loss_is_error\":false,\"throughput\":237.5182913153,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Harris_Inc\",\"cpu_utilization\":83.5172325497,\"cpu_utilization_is_error\":false,\"latency\":7.8220358909,\"latency_is_error\":false,\"packet_loss\":1.3942153104,\"packet_loss_is_error\":false,\"throughput\":274.9563709951,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Johnson__Smith_and_Lewis\",\"cpu_utilization\":65.3007890236,\"cpu_utilization_is_error\":false,\"latency\":9.012152204,\"latency_is_error\":false,\"packet_loss\":0.0,\"packet_loss_is_error\":false,\"throughput\":247.0685516947,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Banks-Young\",\"cpu_utilization\":80.0440916828,\"cpu_utilization_is_error\":false,\"latency\":7.304937434,\"latency_is_error\":false,\"packet_loss\":2.1692271547,\"packet_loss_is_error\":false,\"throughput\":279.7641913689,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Gonzalez_Group\",\"cpu_utilization\":71.4195844054,\"cpu_utilization_is_error\":false,\"latency\":0.0,\"latency_is_error\":false,\"packet_loss\":0.0,\"packet_loss_is_error\":false,\"throughput\":260.0017327497,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Moore-Guerrero\",\"cpu_utilization\":65.0205705374,\"cpu_utilization_is_error\":false,\"latency\":1.7684290753,\"latency_is_error\":false,\"packet_loss\":0.0,\"packet_loss_is_error\":false,\"throughput\":266.4209778666,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n', '{\"company\":\"Sanchez__Bennett_and_Thompson\",\"cpu_utilization\":67.2085370307,\"cpu_utilization_is_error\":false,\"latency\":4.5898304002,\"latency_is_error\":false,\"packet_loss\":0.0,\"packet_loss_is_error\":false,\"throughput\":274.830056152,\"throughput_is_error\":false,\"timestamp\":1593707325519}\\n']\n"
     ]
    }
   ],
   "source": [
    "with open(source_file) as myfile:\n",
    "    head = [next(myfile) for x in range(10)]\n",
    "print(head)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Benchmark\n",
    "\n",
    "The benchmark tests use the following flow:\n",
    "\n",
    "- Read file\n",
    "- Compute aggregations\n",
    "- Get the n-largest values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "benchmark_file = source_file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following examples, `timeit` is executed in a loop.<br>\n",
    "You can change the number of runs and loops:\n",
    "```\n",
    "%%timeit -n 1 -r 1\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test Load Times"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### cuDF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5.04 s ± 35.7 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1 -r 2\n",
    "gdf = cudf.read_json(benchmark_file, lines=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "36.7 s ± 202 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1 -r 2\n",
    "pdf = pd.read_json(benchmark_file, lines=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test Aggregation\n",
    "\n",
    "Load the files to memory to allow applying `timeit` only to the aggregations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "gdf = cudf.read_json(benchmark_file, lines=True)\n",
    "pdf = pd.read_json(benchmark_file, lines=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### cuDF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "246 ms ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1 -r 7\n",
    "\n",
    "ggdf = gdf.groupby(['company']).\\\n",
    "            agg({k: ['min', 'max', 'mean'] for k in metric_names})\n",
    "raw_nlargest = gdf.nlargest(nlargest, 'cpu_utilization')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.82 s ± 38.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 1 -r 7\n",
    "\n",
    "gpdf = pdf.groupby(['company']).\\\n",
    "            agg({k: ['min', 'max', 'mean'] for k in metric_names})\n",
    "raw_nlargest = pdf.nlargest(nlargest, 'cpu_utilization')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
